New Cases:
- I pooled the estimated effects and standard errors across groups for
each quarter in each state for the models below - they look a little off
in some places, so I need to go back and check them out.
Model 1: Random intercept
gam(cbind(Positive, Negative) ~ s(state, bs = "re", k = 27) + s(time, bs = "tp", k = 10)
+ age_cat + hiv_status + sex,
data = mdf_new_grp,
family = binomial (link = "logit"),
method = "REML")

Model 2: Random slope
gam(cbind(Positive, Negative) ~ s(time, bs = "tp", k = 10) + s(time, state, bs = "re", k = 27)
+ age_cat + hiv_status + sex,
data = mdf_new_grp,
family = binomial (link = "logit"),
method = "REML")
Model 3: Random intercept + Random slope
gam(cbind(Positive, Negative) ~ s(state, bs = "re", k = 27) + s(time, state, bs = "re", k = 27)
+ age_cat + hiv_status + sex,
data = mdf_new_grp,
family = binomial (link = "logit"),
method = "REML")
Note: For the 95% CIs, the Y axes vary for each state to capture the
state-specific variation over time.

Same plot as above, just with fixed Y-axis.

Model 4: GI - Random intercept + group-level smooths with varying
smoothness
gam(cbind(Positive, Negative) ~ s(state, bs = "re", k=27) + s(time, bs = "tp", k = 10)
+ s(time, by = state, bs = "tp", m=1, k = 10) + age_cat + hiv_status + sex,
data = mdf_new_grp,
family = binomial (link = "logit"),
method = "REML")

Model 5: GS - Global smooth + group-level smooths with same
smoothness
gam(cbind(Positive, Negative) ~ s(time, bs = "tp", m=2)
+ s(time, state, bs = "fs", m = 2) + age_cat + hiv_status + sex,
data = mdf_new_grp,
family = binomial (link = "logit"),
method = "REML")
Compare models

I also fit the above without the number tested to get a better sense
of how the modeled trends compare to each other

Re-Entry:
Combined
I figured I would present these all together since each model is
still noisy, especially in states where there are no rif cases
(e.g. Acre, Roraima, and Tocantins). I quickly looked into potential
inclusion criteria for percent tested, which would help address some of
these issues, especially we define the inclusion criteria as
consistently testing > 5% of case population for at least two
consecutive periods. This would largely exclude 2014-2015. I think we
would want to add a threshold to the minimum number of cases tested,
since some state who test a few folks, but who have only a few cases
overall would exceed the threshold.


---
title: "RR-TB Updates - Aug. 31"
output:
  pdf_document: default
  html_notebook: default
  html_document:
    df_print: paged
editor_options:
  chunk_output_type: inline
---
## Updates
* Updated HIV values (to prioritize test result, then self-report)
* Ran models for relapsed and re-entry cases; Played with K settings in smooths (effect is pretty marginal)
* Health Unit: 
  * Not added to models yet
  * I noticed there was a lot of missingness in Sao Paulo between 2018-2019 (it looks like it jumps from a few hundred missing cases to >20k, and then back down in 2020) -- I'm double checking whether this is an issue on my end or worth asking the Brazil team about

* Finalized list of potential individual and municipal-level covariates to add in and found data sources: 
  * Urbanicity, defined as proportion of municipality pop living in urban setting
    * Source: IBGE - 2010 census
  * Bolsa Familia Coverage 
    * Source: Baroni et al., 2021 paper - Database on the coverage of the “Bolsa-Família” conditioning cash-transfer program: Brazil, 2005 to 2021)
  * FHS Coverage, defined as number of people per municipality registered in 2015
    * Source: DATASUS 
  * Household crowding, Defined as percent of household with > 2 people living in a sleeping area
    * Source: 2010 Census, via Atlas BR
  * Treatment abandonment rate per municipality in period quarter (or year)
    * Source: Calculate from Sinan







# New Cases: 
* I pooled the estimated effects and standard errors across groups for each quarter in each state for the models below - they look a little off in some places, so I need to go back and check them out. 




### Model 1: Random intercept
```{r new_m1, include=TRUE, eval = FALSE, out.width= '100%'}
gam(cbind(Positive, Negative) ~ s(state, bs = "re", k = 27) + s(time, bs = "tp", k = 10) 
    + age_cat + hiv_status + sex, 
             data = mdf_new_grp, 
             family = binomial (link = "logit"), 
             method = "REML")
```


<!-- Variance of the random effects:  -->
<!-- ```{r, echo=FALSE} -->
<!-- library(mgcv) -->
<!-- load("~/Dropbox (Personal)/Research/Brazil TB/Rif_Project/output/fits/gam_new_intercept.rda") -->
<!-- variance_comp(new_intercept) -->
<!-- ``` -->

```{r echo=FALSE, out.width= '100%'}
knitr::include_graphics(here::here("~/Dropbox (Personal)/Research/Brazil TB/Rif_Project/figures/gam_m1_plot.png"))
```



### Model 2: Random slope 
```{r new_m2, include=TRUE, eval = FALSE, out.width= '100%'}
gam(cbind(Positive, Negative) ~ s(time, bs = "tp", k = 10) + s(time, state, bs = "re", k = 27) 
    + age_cat + hiv_status + sex, 
                           data = mdf_new_grp, 
                           family = binomial (link = "logit"), 
                           method = "REML")
```

<!-- Variance of the random effects:  -->
<!-- ```{r, echo=FALSE} -->
<!-- load("~/Dropbox (Personal)/Research/Brazil TB/Rif_Project/output/fits/gam_new_slope.rda") -->
<!-- variance_comp(new_slope) -->
<!-- ``` -->

```{r, echo = FALSE, out.width= '100%'}
knitr::include_graphics(here::here("~/Dropbox (Personal)/Research/Brazil TB/Rif_Project/figures/gam_m2_plot.png"))
```





### Model 3: Random intercept + Random slope

```{r, include = TRUE, eval = FALSE, out.width= '100%'}
gam(cbind(Positive, Negative) ~ s(state, bs = "re", k = 27) + s(time, state, bs = "re", k = 27) 
    + age_cat + hiv_status + sex, 
              data = mdf_new_grp, 
              family = binomial (link = "logit"), 
              method = "REML")
```

<!-- Variance of the random effects:  -->
<!-- ```{r, echo = FALSE} -->
<!-- load("~/Dropbox (Personal)/Research/Brazil TB/Rif_Project/output/fits/gam_new_intercept_slope.rda") -->
<!-- variance_comp(new_intercept_slope) -->
<!-- ``` -->


Note: For the 95% CIs, the Y axes vary for each state to capture the state-specific variation over time. 
```{r, echo = FALSE, out.width= '100%'}
knitr::include_graphics(here::here("~/Dropbox (Personal)/Research/Brazil TB/Rif_Project/figures/gam_m3_plot.png"))
```

Same plot as above, just with fixed Y-axis. 
```{r, echo = FALSE, out.width= '100%'}
knitr::include_graphics(here::here("~/Dropbox (Personal)/Research/Brazil TB/Rif_Project/figures/gam_m3_plot_fixed.scales.png"))
```



### Model 4: GI - Random intercept + group-level smooths with varying smoothness

```{r, include = TRUE, eval = FALSE}
gam(cbind(Positive, Negative) ~ s(state, bs = "re", k=27) + s(time, bs = "tp", k = 10) 
    + s(time, by = state, bs = "tp", m=1, k = 10) + age_cat + hiv_status + sex, 
                           data = mdf_new_grp, 
                           family = binomial (link = "logit"), 
                           method = "REML")

```

<!-- Variance of the random effects:  -->
<!-- ```{r, echo = FALSE} -->
<!-- load("~/Dropbox (Personal)/Research/Brazil TB/Rif_Project/output/fits/gam_new_gi.rda") -->
<!-- variance_comp(new_gi) -->
<!-- ``` -->


```{r, echo = FALSE, out.width= '100%'}
knitr::include_graphics(here::here("~/Dropbox (Personal)/Research/Brazil TB/Rif_Project/figures/gam_m4_plot.png"))
```






### Model 5: GS - Global smooth + group-level smooths with same smoothness

```{r, include = TRUE, eval = FALSE, out.width= '100%'}
gam(cbind(Positive, Negative) ~ s(time, bs = "tp", m=2) 
    + s(time, state, bs = "fs", m = 2) + age_cat + hiv_status + sex, 
              data = mdf_new_grp, 
              family = binomial (link = "logit"), 
              method = "REML")
```

<!-- Variance of the random effects: I'm assuming there are three variance components for FS because FS is defined by three basis functions.  -->
<!-- ```{r, echo = FALSE} -->
<!-- load("~/Dropbox (Personal)/Research/Brazil TB/Rif_Project/output/fits/gam_new_gs.rda") -->
<!-- variance_comp(new_gs) -->
<!-- ``` -->

```{r, echo = FALSE, out.width= '100%'}
knitr::include_graphics(here::here("~/Dropbox (Personal)/Research/Brazil TB/Rif_Project/figures/gam_m5_plot.png"))
```



## Compare models 
```{r, echo = FALSE, out.width= '100%'}
knitr::include_graphics(here::here("~/Dropbox (Personal)/Research/Brazil TB/Rif_Project/figures/gam_all_plot.png"))
```
I also fit the above without the number tested to get a better sense of how the modeled trends compare to each other


```{r, echo = FALSE, out.width= '100%'}
knitr::include_graphics(here::here("~/Dropbox (Personal)/Research/Brazil TB/Rif_Project/figures/gam_all_plot_trend.only.png"))
```







# Re-Entry: 
## Combined 
I figured I would present these all together since each model is still noisy, especially in states where there are no rif cases (e.g. Acre, Roraima, and Tocantins). I quickly looked into potential inclusion criteria for percent tested, which would help address some of these issues, especially we define the inclusion criteria as consistently testing > 5% of case population for at least two consecutive periods. This would largely exclude 2014-2015. I think we would want to add a threshold to the minimum number of cases tested, since some state who test a few folks, but who have only a few cases overall would exceed the threshold. 






```{r echo = FALSE, out.width= '100%'}
knitr::include_graphics(here::here("~/Dropbox (Personal)/Research/Brazil TB/Rif_Project/figures/gam_all_plot_reentry_fixed.axes.png"))
```




```{r echo=FALSE, out.width= '100%'}
knitr::include_graphics(here::here("~/Dropbox (Personal)/Research/Brazil TB/Rif_Project/figures/gam_all_plot_reentry_trends.only.png"))
```




# Relapse
## Combined: 
```{r echo = FALSE, out.width= '100%'}
knitr::include_graphics(here::here("~/Dropbox (Personal)/Research/Brazil TB/Rif_Project/figures/gam_all_plot_relapse.png"))
```


```{r echo = FALSE, out.width= '100%'}
knitr::include_graphics(here::here("~/Dropbox (Personal)/Research/Brazil TB/Rif_Project/figures/gam_all_plot_relapse_trends.only.png"))
```

